Bayesian image denoising method based on distribution constraint of noisy images

ABSTRACT

Embodiments of the present disclosure provide a Bayesian image denoising method based on a distribution constraint of noiseless images from a distribution of noisy images. The method includes: determining a distribution constraint of noiseless images from a distribution of noisy images; and performing Bayesian denoising on noisy images based on the distribution constraint of noiseless images.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to and benefits of Chinese PatentApplication Serial No. 202010696129.6, filed on Jul. 20, 2020, theentire disclosure of which is incorporated by reference herein.

FIELD

The present disclosure relates to the field of digital image processingand computer vision technologies, and more particularly, to a Bayesianimage denoising method based on a distribution constraint of noiselessimages from a distribution of noisy images.

BACKGROUND

In the process of image acquisition, such as photography, public safetymonitoring, medical imaging, and microscopic imaging, due to factorssuch as the random characteristics of physical signals, environmentalinterference and the own state changes of systems, the collected imagesinevitably contain noise, which significantly affects the image quality.Image denoising is an important method and technology in imageprocessing, which aims to reduce the random noise and restore theeffective information of images.

Classical image denoising methods include mean filtering, medianfiltering, Gaussian low-pass filtering and other filtering methods.These filters limit the noise related components of the image to achievethe effect of noise removal. The filtering method may be implemented inboth the image domain and the transform domain after a specifictransformation on the image, such as the frequency domain, waveletdomain, etc. However, the filtering method has an inherent disadvantagein preserving image details. In addition to the analytical filteringmethod, the image iterative optimization method based on Bayesianposteriori probability distribution is another kind of image denoisingmethod, which can remove noise by introducing additional penalty termssuch as total variation minimization and Markov random field potentialenergy function minimization. The Bayesian posterior estimation methodcan produce a more accurate noiseless image estimation, but the resultis significantly dependent on the penalty term imposed.

In recent years, with the development of artificial neural network anddeep learning technology, the application of deep learning method toimage denoising has significantly improved the effect of imagedenoising. Compared with the classical methods, the image details can berecovered more accurately while removing noise. Although a variety ofneural networks in many applications show strong image denoising andimage restoration capabilities, they are all dependent on a large numberof training datasets. Supervised learning requires paired noisy imagesand noiseless images as truth values for network training. Generativeadversarial learning can extract the distribution rules and features ofnoiseless images from the unpaired noiseless image dataset, and then theimage denoising is performed, eliminating the requirement of datapairing. However, in most applications, real noiseless data is oftendifficult to obtain or expensive to acquire, which makes it difficultfor deep learning in the application of image denoising.

SUMMARY

Embodiments of the present disclosure provide a Bayesian image denoisingmethod based on a distribution constraint of noiseless images from adistribution of noisy images. The method includes: determining adistribution constraint of noiseless images from a distribution of noisyimages; and performing Bayesian denoising on noisy images based on thedistribution constraint of noiseless images.

Additional aspects and advantages of the present disclosure will begiven in the following description, and some of them will becomeapparent from the following description or be known through the practiceof the present disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flowchart of a Bayesian image denoising method based on adistribution constraint of noiseless images from a distribution of noisyimages according to an embodiment of the disclosure.

FIG. 2 is a schematic diagram of an image denoising neural network andits cost function in Embodiment 2 of the disclosure.

FIG. 3(a)-(b) is a schematic diagram of an implementation of a neuralnetwork structure involved in Embodiment 2 and Embodiment 3 of thedisclosure, where (a) is an image denoising neural network and (b) is aneural network discriminator.

FIG. 4 is a schematic diagram of the image denoising neural network andneural network discriminator of Embodiment 3 of the disclosure and theircost functions.

DETAILED DESCRIPTION

Embodiments of the present disclosure provide a Bayesian image denoisingmethod based on a distribution constraint of noisy images. Asillustrated in FIG. 1 , the method may include followings.

At block 101, a distribution constraint of noiseless images isdetermined from a distribution of noisy images.

At block 102, Bayesian denoising is performed on noisy images based onthe distribution constraint.

The distribution constraint of the noiseless images may be determined asfollows.

For images of which the distribution can be accurately or approximatelyexpressed analytically, the distribution constraint of noiseless imagesis estimated using a characteristic function method as follows.

(1) a probability density function ƒ_(Z) of a random vector Z for noisyimages generated by an imaging system is estimated analytically, and aprobability density function ƒ_(N) of a random vector N for additivenoise of the imaging system is estimated analytically.

(2) Fourier transformation is performed on the probability densityfunctions ƒ_(Z) and ƒ_(N), to obtain the characteristic function F_(Z)of the random vector Z and the characteristic function F_(N) of therandom vector N.

(3) a characteristic function F_(S) of a random vector S for noiselessimages is obtained based on the characteristic function F_(Z) of therandom vector Z and the characteristic function F_(N) of the randomvector N according to a formula of F_(S)=F_(Z)/F_(N).

Alternatively, with the objective optimization method, according to theconsistency between the characteristic function F_(S) of andF_(Z)/F_(N), the estimation value F_(S)* of the characteristic functionF_(S) is obtained by:

${F_{S}^{*} = {\underset{F_{S}}{argmin}\mspace{11mu}{\Phi( {F_{S},{F_{Z}/F_{N}}} )}}},$

where Φ(g, g) is a distance measure between the characteristic functionsof two random vectors, and argmin represents optimizing parameter F_(S)so that the objective function is minimized.

(4) inverse Fourier transformation is performed on the characteristicfunction F_(S) or the estimation value F_(S), to obtain a probabilitydensity function ƒ_(S) of the random vector S, i.e., a distribution ofthe random vector S is obtained.

(5) analytical filtering denoising or Bayesian iterative denoising isperformed on a set of noisy image samples z₁ z₂, z₃, . . . , z_(n), toobtain initial estimation of the set Ŝ for estimated noiseless imagesample ŝ₁, ŝ₂, ŝ₃, . . . ŝ_(n).

(6) a distribution distance D(Ŝ,S) between the distribution of set S andthe distribution of the random vector S is calculated according to adistribution distance function D, and the distribution distance D(Ś,S)is determined as the distribution constraint of noiseless images forestimating noiseless images.

The distribution distance D(Ś,S) is calculated using the method of aneural network discriminator, which includes followings.

(6-1) Assume that the distribution distance function D contains a neuralnetwork discriminator Ω_(D) as a distribution distance operator, inwhich the neural network discriminator has a parameter set θ_(D).

(6-2) the distribution of the set Ŝ and the distribution of randomvector S are used to compute distribution distance D(Ś,S) where thedistribution distance function D is defined by a cross entropy distancefunction, which is realized by the neural network discriminator Ω_(D):

${D( {\overset{\hat{}}{S},S} )} = {{\sum\limits_{i = {1:n}}{\log( {\Omega_{D}( {s_{i};\theta_{D}} )} )}} + {\log( {1 - {\Omega_{D}( {{\overset{\hat{}}{s}}_{i};\theta_{D}} )}} )}}$

or a Wasserstein distance function:

${{D( {\overset{\hat{}}{S},S} )} = {\sum\limits_{i = {1:n}}{{{\Omega_{D}( {s_{i};\theta_{D}} )} - {\Omega_{D}( {{\overset{\hat{}}{s}}_{i};\theta_{D}} )}}}}};$

where, s₁, s₂, s₃, . . . , s_(n) are n samples from the distribution ofthe random vector S for noiseless images.

Alternatively, the distribution distance function in other generativeadversarial network methods in the generative adversarial network fieldcan be adopted. The generative adversarial network methods in thegenerative adversarial network filed is any one of Standard GAN (SGAN),Region-Separative GAN (RSGAN), Relativistic average Standard GAN(RaSGAN), Least Squares GAN (LSGAN), Relativistic average Least SquaresGAN (RaLSGAN), Hinge-loss GAN (HingeGAN) Wasserstein GAN with GradientPenalty (WGAN-GP), Region-Separative GAN with Gradient Penalty(RSGAN-GP) or Relativistic average Standard GAN with Gradient Penalty(RaSGAN-GP).

For images of which the distribution is difficult to be expressedanalytically, or for images of which the distribution calculation is toocomplex, the distribution distance is estimated using indirectdistribution constraint method, which includes followings.

(1) analytical filtering or Bayesian iterative denoising is performed ona group of noisy image samples z₁, z₂, z₃, . . . , z_(n) generated by animaging system to obtain initial estimation of the correspondingestimated noiseless image samples ŝ₁, ŝ₂, ŝ₃, . . . , ŝ_(n).

(2) a probability density function ƒ_(N) of a random vector N foradditive noise of the imaging system is estimated analytically, andnoise samples n₁*, n₂*, n₃*, . . . , n_(n)* are sampled from thedistribution of the random vector N, and the number of the noise samplesn₁*, n₂*, n₃*, . . . , n_(n)* is the same as that of the noisy imagesamples z₁, z₂, z₃, . . . , z_(n).

Alternatively, when there is no signal, the noise samples n₁*, n₂*, n₃*,. . . , n_(n)* of the imaging system are collected practically.

(3) the noise samples n₁*, n₂*, n₃*, . . . , n_(n)* and the estimatednoiseless image samples ŝ₁, ŝ₂, ŝ₃, . . . , ŝ_(n) are added to generateestimated noisy image sample {circumflex over (z)}₁, {circumflex over(z)}₂, {circumflex over (z)}₃, . . . , {circumflex over (z)}_(n), inwhich {circumflex over (z)}_(i)=ŝ_(i)+n_(i)*, i=1:n.

(4) The distribution distance D({circumflex over (Z)},Z) between a set{circumflex over (Z)} for the estimated noisy image samples {circumflexover (z)}₁, {circumflex over (z)}₂, {circumflex over (z)}₃, . . .{circumflex over (z)}_(n) and a set Z for the noisy image samples z₁,z₂, z₃, . . . , z_(n) is calculated according to a distribution distancefunction D, and the distribution distance D({circumflex over (Z)},Z) isdetermined as distribution constraint of noiseless images.

The distribution distance D({circumflex over (Z)},Z) is calculated usingthe method of a neural network discriminator, which includes followings.

(4-1) Assume that the distribution distance function D contains a neuralnetwork discriminator Ω_(D) as a distribution distance operator, inwhich the neural network discriminator Ω_(D) has a parameter set θ_(D).

(4-2) the set {circumflex over (Z)} and the set Z of real noisy imagesamples are inputted into the distribution distance function D, and theneural network discriminator Ω_(D) is used to calculate a cross entropydistance function between the distribution of the set {circumflex over(Z)} and the distribution of the set Z:

${D( {\overset{\hat{}}{Z},Z} )} = {{\sum\limits_{i = {1:n}}{\log( {\Omega_{D}( {z_{i};\theta_{D}} )} )}} + {\log( {1 - {\Omega_{D}( {{\overset{\hat{}}{z}}_{i};\theta_{D}} )}} )}}$

or a Wasserstein distance function:

${D( {\overset{\hat{}}{Z},Z} )} = {\sum\limits_{i = {1:n}}{{{{\Omega_{D}( {z_{i};\theta_{D}} )} - {\Omega_{D}( {{\overset{\hat{}}{z}}_{i};\theta_{D}} )}}}.}}$

Alternatively, the distribution distance function in other generativeadversarial network methods in the generative adversarial network fieldcan be adopted. The generative adversarial network methods in thegenerative adversarial network filed is any one of Standard GAN (SGAN),Region-Separative GAN (RSGAN), Relativistic average Standard GAN(RaSGAN), Least Squares GAN (LSGAN), Relativistic average Least SquaresGAN (RaLSGAN), Hinge-loss GAN (HingeGAN) Wasserstein GAN with GradientPenalty (WGAN-GP), Region-Separative GAN with Gradient Penalty(RSGAN-GP) or Relativistic average Standard GAN with Gradient Penalty(RaSGAN-GP).

A probability density function ƒ_(N) of a random vector N for additivenoise of the imaging system is estimated analytically includes:obtaining a distribution form of the random vector N for additive noiseaccording to physical laws of the imaging system, and obtaining theprobability density function ƒ_(N) of the random vector N by measuringdistribution parameters of the random vector N through simulation oractual experiments.

Alternatively, a probability density function ƒ_(N) of a random vector Nfor additive noise of the imaging system is estimated analytically:Gaussian mixture distribution is used to fit the unknown distribution.First, assume that the random vector N for additive noise obeys aGaussian mixture distribution, and then the probability density functionƒ_(N) of the random vector N for additive noise is obtained by measuringGaussian mixture distribution parameters through simulation or actualexperiments.

A probability density function ƒ_(N) of a random vector N for additivenoise of the imaging system is estimated analytically includes:transforming a distribution of non-additive noise into an approximateadditive noise through a specific mathematical transformation, and thenthe probability density function ƒ_(N) of the random vector N foradditive noise is obtained.

In above methods, a probability density function ƒ_(Z) of a randomvector Z for noisy images generated by an imaging system is estimatedanalytically includes: obtaining the probability density function ƒ_(Z)of the random vector Z for noisy images through frequency statistics ofa large number of noisy image samples.

Alternatively, a probability density function ƒ_(Z) of a random vector Zfor noisy images generated by an imaging system is estimatedanalytically includes: obtaining a distribution form of the randomvector Z for noisy images using a known distribution model assumption,and estimating the probability density function ƒ_(Z) of the randomvector Z for noisy images by measuring distribution parameters of therandom vector Z through simulation or actual experiments.

Bayesian denoising is performed on noisy images based on thedistribution constraint as follows.

(1) a Bayesian posterior cost function Ψ is constructed for noisy imagesamples z₁, z₂, z₃, . . . , z_(n) and estimated noiseless image samplesŝ₁, ŝ₂, ŝ₃, . . . , ŝ_(n):

${\psi( {{\overset{\hat{}}{s}}_{1},{\overset{\hat{}}{s}}_{2},{\overset{\hat{}}{s}}_{3},\ldots,{\overset{\hat{}}{s}}_{n}} )} = {{\sum\limits_{i = {1:n}}{\varphi_{f}( {z_{i} - {\overset{\hat{}}{s}}_{i}} )}} + {\lambda D}}$

the first term φ_(f) of the Bayesian posterior cost function Ψ is a datafidelity term between noisy image samples z₁, z₂, z₃, . . . , z_(n) andthe estimated noiseless image samples ŝ₁, ŝ₂, ŝ₃, . . . , ŝ_(n), thesecond term D is the distribution constraint, that is, D(Ŝ,S) or thedistribution distance D({circumflex over (Z)},Z), λ is a hyperparameteradjusting the importance of the distribution constraint.

(2) the Bayesian posterior cost function Ψ is minimized to obtain finalestimated noiseless images ŝ₁, ŝ₂, ŝ₃, . . . ŝ_(n), thereby realizingimage denoising.

The Bayesian posterior cost function Ψ is minimized using a classicalmathematical optimization method, including any one of a Lagrangemultiplier method, an alternate iteration method, an alternativeconjugate least square method or a gradient optimization algorithm.

Alternatively, The Bayesian posterior cost function Ψ is minimized usinga neural network optimization method, which includes followings.

(1) an image denoising neural network Ω_(E) is constructed as an imagedenoising operator ŝ_(i)≡Ω_(E)(z_(i); θ_(E)), i=1:n, where θ_(E) is aparameter set of the image denoising neural network.

(2) taking the Bayesian posterior cost function Ψ as the cost functionto train the image denoising neural network, the image denoising neuralnetwork is trained using a dataset with a large number of noisy imagesamples.

(3) the noisy images are inputted into the image denoising neuralnetwork Ω_(E), and the estimated noiseless images are outputted asBayesian denoising results.

The distribution distance function D is an analytical distributiondistance function. The analytic distribution distance function includesa distribution distance function of K-L divergence or J-S divergence.

Deep learning shows good performance in the application of imagedenoising. Compared with the classical filtering methods, the effect issignificantly improved, but the deep learning method relies heavily onthe noiseless image dataset. The requirement for noiseless datasetslimits the implementation and application situations of deep learning inreal image denoising. In Bayesian posterior probability theory, theestimation of noiseless images from noisy images depends on the modelingof the prior distribution of noiseless images. The disclosure firstproposes a method of learning the distribution of noiseless images fromthe noisy image samples under the condition that the distribution modelof additive noise is known. The disclosure further proposes a method oftransforming the distribution constraint of the noisy images into thedistribution constraint of noiseless images for Bayesian denoising. Thedisclosure also proposes a neural network optimization method for Bayesdenoising which trains the image denoising neural network in anunsupervised way. The disclosure is to make full use of noisy imagesamples to accurately learn the implicit distribution features ofnoiseless images, and then realize efficient image denoising.

The images in the method of the disclosure refers to two-dimensional orthree-dimensional signals in general, including visible images, clinicalmedical images, microscopic images and other images. This method may beapplied to many fields such as photograph, imaging for public safetyinspection, medical imaging, microscopic imaging, etc.

Embodiments of the method of the disclosure presented as follows.

Embodiment 1: Denoising of Noisy Handwritten Numeral Images

The imaging system used in this embodiment is a handwritten numeralacquisition system.

Noisy images: noisy handwritten numeral images, with image resolution of28*28

Noiseless images: noiseless handwritten numeral images, with imageresolution of 28*28.

(1) A distribution constraint of noiseless handwritten numeral images isdetermined based on a distribution of noisy handwritten numeral imagesas follows.

(1-1) A probability density function ƒ_(Z) of a random vector Z fornoisy handwritten numeral images is estimated in an analytic form.

70,000 noisy handwritten numeral image samples are collected from thehandwritten numeral acquisition system, and the probability densityfunction ƒ_(Z) of the random vector Z of noisy handwritten numeralimages is obtained through frequency statistics of these 70,000 noisyimage samples.

(1-2) A probability density function ƒ_(N) of a random vector N foradditive noise contained in the handwritten numeral acquisition systemis estimated analytically:

Gaussian mixture distribution is used to fit the distribution of therandom vector N. First, assume that each pixel of the additive noiserandom vector obeys a five-dimensional Gaussian mixture distribution,noise samples are obtained from zero-signal pixels in the noisyhandwritten numeral images, and these noise samples are used to estimatethe parameters of five-dimensional Gaussian mixture distribution througha maximum likelihood method, thus the probability density function ƒ_(N)of the additive noise random vector N is obtained.

(1-3) Fourier transformation is performed on the probability densityfunction ƒ_(Z) and probability density function ƒ_(N) respectively toobtain the characteristic function F_(Z) or the random vector Z fornoisy handwritten numeral images and the characteristic function F_(N)of the random vector N for additive noise.

(1-4) a characteristic function F_(S) of a random vector S for noiselesshandwritten numeral images is obtained based on the characteristicfunction F_(Z) of the random vector Z for noisy handwritten numeralimages and the characteristic function F_(N) of the random vector N foradditive noise according to a formula of F_(S)=F_(Z)/F_(N).

(1-5) inverse Fourier transform is performed on the characteristicfunction F_(S) of the random vector S for noiseless handwritten numeralimages, to obtain the probability density function ƒ_(S) of the randomvector S, that is, the distribution of the random vector S for noiselesshandwritten numeral images.

(1-6) analytical filtering denoising is performed for a group of noisyhandwritten digital image samples z₁, z₂, z₃, . . . z_(n) that need tobe denoised by this method to obtain initial estimation of a set S forthe estimated noiseless handwritten digital image samples ŝ₁, ŝ₂, ŝ₃, .. . , ŝ_(n);

(1-7) A distribution distance D(Ŝ,S) between the distribution of the setŜ for noiseless handwritten numeral image samples and the distributionof the random vector S for noiseless handwritten numeral images iscalculated according to K-L distribution distance function D:

${D( {\overset{\hat{}}{S},S} )} = {\sum\limits_{i = 1}^{n}{\frac{1}{n}{( {{\ln\frac{1}{n}} - {f_{s}( {\overset{\hat{}}{s}}_{i} )}} ).}}}$

and the distribution distance D(Ŝ,S) is used as the distributionconstraint of noiseless handwritten numeral images for estimatingnoiseless images.

(2) Bayesian denoising is performed on noisy handwritten numeral imagesbased on the distribution constraint D(Ŝ,S) including the followingsteps:

(2-1) For the group of noisy handwritten numeral image samples z₁, z₂,z₃, . . . , z_(n) that need to be denoised by this method and thecorresponding estimated noiseless handwritten digital image samples ŝ₁,ŝ₂, ŝ₃, . . . , ŝ_(n), a Bayesian posterior cost function Ψ isconstructed:

${\psi( {{\overset{\hat{}}{s}}_{1},{\overset{\hat{}}{s}}_{2},{\overset{\hat{}}{s}}_{3},\ldots,{\overset{\hat{}}{s}}_{n}} )} = {{\sum\limits_{i = {1:n}}{\varphi_{f}( {z_{i} - {\overset{\hat{}}{s}}_{i}} )}} + {\lambda{D( {\overset{\hat{}}{S},S} )}}}$

the first term φ_(f) of the Bayesian posterior cost function Ψ is a datafidelity term between the noisy handwritten numeral image samples z₁,z₂, z₃, . . . , z_(n) and estimated noiseless handwritten digital imagesamples ŝ₁, ŝ₂, ŝ₃, . . . , ŝ_(n), and the second term (Ŝ,S) is thedistribution constraint, λ is a hyperparameter adjusting the importanceof the distribution constraint.

(2-2) Gradient optimization algorithm is adopted to minimize theBayesian posterior cost function Ψ, to obtain the final estimatednoiseless handwritten numeral image sample ŝ₁, ŝ₂, ŝ₃, . . . , ŝ_(n),realizing image denoising.

Embodiment 2: Nighttime Photograph Denoising

The imaging system used in this embodiment is: camera

Noisy images: noisy photograph patches with the image resolution of512*512

Noiseless images: Noiseless photograph patches, with image resolution of512*512

The resolution of the original noisy photograph is usually greater than512*512. Noisy photograph patches of 512*512 is extracted from theoriginal noisy photograph. Each noisy photograph patch is denoised toobtain an estimation of noiseless photograph patch, and then theestimated noiseless photograph may be obtained by concatenating togetherthe noisy photograph patches.

(1) A distribution constraint of noiseless photograph patches isdetermined based on a distribution of noisy photograph patches includingthe following steps.

(1-1) A set Z for noisy photograph patch samples z₁, z₂, z₃, . . . ,z_(n) is collected from the camera. Analytical filtering is performed ona set Z for noisy photograph patch samples z₁, z₂, z₃, . . . , z_(n)that need to be denoised by this method to obtain the initial estimationthe estimated noiseless photograph patch samples ŝ₁, ŝ₂, ŝ₃, . . . ,ŝ_(n).

(1-2) A probability density function ƒ_(N) of a random vector N foradditive noise in the camera system is estimated analytically.

A Gaussian mixture distribution is used to fit the distribution of therandom vector N. First, it is assumed that each pixel of the randomvector N obeys the five-dimensional Gaussian mixture distribution. Thenoise samples are measured when the camera lens is completely blocked,and the parameters of the 5-D Gaussian mixture distribution areestimated by the maximum likelihood method. The probability densityfunction ƒ_(N) of the random vector N is obtained.

(1-3) Simulated noise samples n₁*, n₂*, n₃*, . . . , n_(n)* arecollected from the distribution of the random vector N, in which thenumber of the simulated noise samples n₁*, n₂*, n₃*, . . . , n_(n)* isthe same as the number of noisy image samples.

(1-4) The noise samples n₁*, n₂*, n₃*, . . . , n_(n)* and the estimatednoiseless photograph patch samples ŝ₁, ŝ₂, ŝ₃, . . . , ŝ_(n) are addedto obtain a set {circumflex over (Z)} for estimated noisy photographpatch samples {circumflex over (z)}₁, {circumflex over (z)}₂,{circumflex over (z)}₃, . . . , {circumflex over (z)}_(n): {circumflexover (z)}_(i)=ŝ_(i)+n_(i)*, i=1:n.

The distribution distance D({circumflex over (Z)},Z) between thedistribution of the set {circumflex over (Z)} for estimated noisyphotograph patch samples and the distribution of the set Z for realnoisy photograph patch samples is calculated using the J-S distributiondistance function D.

Firstly, a probability density function ƒ_(Z) of the set Z for the noisyphotograph patch samples is estimated analytically. The distributionform of the of the set Z is assumed joint Gaussian distribution model:the set Z for noisy photograph patch samples is generated fromindependent standard Gaussian distribution of 500 dimensions throughlinear transformation of which the linear coefficients are denoted asθ_(L). The maximum likelihood method is used to calculate thedistribution parameter of the set Z for noisy photograph patch samples,i.e. the linear coefficients θ_(L). Thus, the probability densityfunction ƒ_(Z) of the set Z for noisy photograph patch samples isobtained.

Similarly, the probability density function ƒ_({circumflex over (Z)}) ofthe set {circumflex over (Z)} for the estimated noisy photograph patchsamples is estimated analytically.

The distribution distance between the distribution of set {circumflexover (Z)} estimated noisy photograph patch samples and the distributionof set Z for noisy photograph patch samples is calculated with K-Ldistribution distance function D:

${D( {\overset{\hat{}}{Z},Z} )} = {\lbrack {{\sum\limits_{i = 1}^{n}{\frac{1}{n}( {{\ln\frac{1}{n}} - {f_{z}( {\overset{\hat{}}{z}}_{i} )}} )}} + {\sum\limits_{i = 1}^{n}{\frac{1}{n}( {{\ln\frac{1}{n}} - {f_{z}( z_{i} )}} )}}} \rbrack/2}$

D({circumflex over (Z)},Z) is defined as the distribution constraint ofnoiseless photograph patches.

(2) According to the distribution constraint of noiseless photographpatches, Bayesian denoising is performed on the noisy photographpatches.

(2-1) A Bayesian posterior cost function Ψ is constructed for this groupof noisy photograph patch samples z₁, z₂, z₃, . . . , z_(n) that need tobe denoised by this method and corresponding estimated noiselessphotograph patch samples ŝ₁, ŝ₂, ŝ₃, . . . , ŝ_(n):

${\psi( {{\overset{\hat{}}{s}}_{1},{\overset{\hat{}}{s}}_{2},{\overset{\hat{}}{s}}_{3},\ldots,{\overset{\hat{}}{s}}_{n}} )} = {{\sum\limits_{i = {1:n}}{\varphi_{f}( {z_{i} - {\overset{\hat{}}{s}}_{i}} )}} + {\lambda{D( {\overset{\hat{}}{Z},Z} )}}}$

the first term φ_(f) of the Bayesian posterior cost function Ψ is thedata fidelity term between z₁, z₂, z₃, . . . , z_(n) and ŝ₁, ŝ₂, ŝ₃, . .. , ŝ_(n), and the second term D({circumflex over (Z)},Z) is thedistribution constraint, λ is a hyperparameter adjusting the importanceof the distance constraint D({circumflex over (Z)},Z).

(2-2) Neural network optimization method is adopted for the Bayesianposterior cost function Ψ to obtain the final estimated noiselessphotograph patches ŝ₁, ŝ₂, ŝ₃, . . . , ŝ_(n), realizing image denoising:

An image denoising neural network Ω_(E) is constructed as an imagedenoising operator ŝ_(i)≡Ω_(E)(z_(i); θE), i=1:n, where θ_(E) is theparameter set of the image denoising neural network; the Bayesianposterior cost function Ψ is taken as the training cost function of theimage denoising neural network, and a large noisy image sample datasetis used to train the image denoising neural network:

${\min\limits_{\theta_{E}}{\sum\limits_{i = {1:n}}{\varphi_{f}( {z_{i} - {\overset{\hat{}}{s}}_{i}} )}}} + {\lambda{{D( {\overset{\hat{}}{Z},Z} )}.}}$

During inference, the noisy photograph patches is taken as the input ofthe image denoising neural network, and the estimated noiselessphotograph patch is obtained from the output. This noiseless photographpatches are the final estimated noiseless photograph patches.

The image denoising neural network and its cost function are shown inFIG. 2 , in which the specific structure of the image denoising neuralnetwork is shown in FIG. 3(a).

Embodiment 3: Denoising of X-Ray CT Projections

The imaging system used in this embodiment is X-ray CT.

Noisy images: noisy projections, with the resolution of 1024*720

Noiseless images: noiseless projections with the resolution of 1024*720

(1) A distance constraint of the noiseless projections is calculatedbased on a noisy projection distribution. The specific steps are asfollows.

(1-1) A group of noisy projection samples z₁, z₂, z₃, . . . z_(n) iscollected from a X-ray CT system. Bayesian maximum-a-posterior iterativedenoising is performed on the group of noisy projection samples z₁, z₂,z₃, . . . , z_(n) that need to be denoised by this method, and initialestimation of estimated noiseless projection samples ŝ₁, ŝ₂, ŝ₃, . . . ,ŝ_(n) is obtained.

(1-2) The noise contained in the noisy projection samples is notadditive. Exponential transformation and generalized Anscombetransformation are performed on the noisy projection samples z₁, z₂, z₃,. . . z_(n) to obtain the noisy Anscombe projection samples z_(A1),z_(A2), z_(A3), . . . , z_(An):

${z_{Ai} = {\frac{2}{\gamma} \times \sqrt{{\gamma I_{0}{\exp( {- z_{i}} )}} + {0.375\gamma^{2}} + \sigma_{e}^{2}}}},{i = {1:n}}$

where, I₀ is the value of X-ray CT air-value signal, γ is thecontribution of each photon received by the X-ray CT detector to theelectrical signal, and σ_(e) ² is the variance of electronic noise. I₀,γ, σ_(e) ² are measured by actual experiments.

(1-3) Assume that the random vector for additive noise contained in thenoisy Anscombe projection samples z_(A1), z_(A2), z_(A3), . . . , z_(An)is N. According to physical laws, every pixel obeys the standardGaussian distribution, probability density function ƒ_(N) of the randomvector N is the independent standard Gaussian distribution probabilitydensity function of 1024*720 dimensions. Simulated noise samples n₁*,n₂*, n₃*, . . . n_(n)* are collected from the distribution of randomvector N, in which the number of the simulated noise samples n₁*, n₂*,n₃*, . . . n_(n)* is the same as the number of noisy Anscombe projectionsamples z_(A1), z_(A2), z_(A3), . . . , z_(An).

(1-4) Exponential transformation and generalized Anscombe transformationare performed on the estimated noiseless projection samples ŝ₁, ŝ₂, ŝ₃,. . . , ŝ_(n) to obtain estimated noiseless Anscombe projection samplesŝ_(A1), ŝ_(A2), ŝ_(A3), . . . , ŝ_(An). The noise samples n₁*, n₂*, n₃*,. . . , n_(n)* and the estimated noiseless Anscombe projection samplesŝ_(A1), ŝ_(A2), ŝ_(A3), . . . , ŝ_(An) are added to obtain the estimatednoisy Anscombe projection samples {circumflex over (z)}_(A1),{circumflex over (z)}_(A2), {circumflex over (z)}_(A3), . . . ,{circumflex over (z)}_(An), in which {circumflex over(z)}_(Ai)=ŝ_(Ai)+n_(i)*, i=1:n.

(1-5) The set for noisy Anscombe projection samples z_(A1), z_(A2),z_(A3), . . . z_(An) is denoted as Z_(A), the set for estimated noisyAnscombe projection samples is denoted) as {circumflex over (Z)}_(A),and the distribution distance D({circumflex over (Z)}_(A),Z_(A)) betweenthe distribution of set {circumflex over (Z)}_(A) for estimated noisyAnscombe projection samples and the distribution of set Z_(A) of realnoisy Anscombe projection samples is calculated using the neural networkdiscriminator distance function D as follows.

Assume the distribution distance function D is set to include a neuralnetwork discriminator Ω_(D) as the distribution distance operator, andthe parameter set of the neural network discriminator Ω_(D) is θ_(D).The set Z_(A) for noisy Anscombe projection samples and the set{circumflex over (Z)}_(A) for estimated noisy Anscombe projectionsamples are inputted into the distribution distance function, and theneural network discriminator Ω_(D) is used to calculate the crossentropy distance:

${{D( {{\overset{\hat{}}{Z}}_{A},Z_{A}} )} = {\sum\limits_{i = 1}^{n}\lbrack {{\log( {\Omega_{D}( {z_{Ai};\theta_{D}} )} )} + {\log( {1 - {\Omega_{D}( {{\overset{\hat{}}{z}}_{Ai};\theta_{D}} )}} }} \rbrack}}.$

Define D({circumflex over (Z)}_(A),Z_(A)) the distribution constraint ofnoiseless projections.

(2) According to the distribution constraint of noiseless projections,Bayesian denoising is performed on the noisy projections.

(2-1) A Bayesian posterior cost function Ψ is constructed for the groupof noisy projection samples that need to be denoised by this method andthe corresponding estimated noiseless projection samples:

${\psi( {{\overset{\hat{}}{s}}_{1},{\overset{\hat{}}{s}}_{2},{\overset{\hat{}}{s}}_{3},\ldots,{\overset{\hat{}}{s}}_{n}} )} = {{\sum\limits_{i = {1:n}}{\varphi_{f}( {z_{i} - {\overset{\hat{}}{s}}_{i}} )}} + {\lambda{D( {{\overset{\hat{}}{Z}}_{A},Z_{A}} )}}}$

the first term φ_(f) of the Bayesian posterior cost function Ψ is thedata fidelity term between z₁, z₂, z₃, . . . , z_(n) and ŝ₁, ŝ₂, Ŝ₃, . .. , ŝ_(n) and the second term D({circumflex over (Z)}_(A),Z_(A)) is thedistribution constraint, λ is the a hyperparameter adjusting theimportance of the distribution constraint.

(2-2) The Bayesian posterior cost function Ψ is optimized using theneural network optimization method to obtain the final estimatednoiseless projection samples ŝ₁, ŝ₂, ŝ₃, . . . ŝ_(n), realizingprojection denoising.

An image denoising neural network Ω_(E) is constructed as an imagedenoising operator ŝ_(i)≡Ω_(E)(z_(i); θ_(E)), i=1:n where θ_(E) is theparameter set of the image denoising neural network. The Bayesianposterior cost function Ψ is used to train the image denoising neuralnetwork, and a dataset of larger number of noisy projections is used totrain the noisy denoising neural network Ω_(E) and the neural networkdiscriminator Ω_(D) in the distribution constrain:

${\min\limits_{\theta_{E}}{\max\limits_{\theta_{D}}{\sum\limits_{i = {1:n}}{\varphi_{f}( {z_{i} - {\overset{\hat{}}{s}}_{i}} )}}}} + {\lambda{{D( {{\overset{\hat{}}{Z}}_{A},Z_{A}} )}.}}$

During inference, noisy projection samples are taken as the input of theimage denoising neural network, and final estimated noiseless projectionsamples are obtained from the output.

The image denoising neural network, neural network discriminator andtheir cost functions are shown in FIG. 4 , in which the specificstructure of the image denoising neural network is shown in FIG. 3(a),and the specific structure of the neural network discriminator is shownin FIG. 3(b).

The Bayesian image denoising method based on the distribution constraintfrom noisy images proposed in the present disclosure has the followingadvantages.

1. With the method of the present disclosure, when the distributionmodel of additive noise independent of the signal is known, thedistribution estimation of noiseless images may be realized indirectlythrough the distribution estimation of noisy images. This distributionconstraint transformation avoids the need of noiseless images intraditional distribution modeling methods such as dictionary learning,which makes noiseless image distribution modeling easier to implement.

2. The distribution constraints involved in the method of the presentdisclosure are derived from datasets rather than human settings, so ithas flexibility and data specificity. Compared with the manuallydesigned constraint in the traditional Bayesian estimation method, thedistribution constraint of the present method is more accurate and canestimate the noiseless image more accurately.

3. With the method of the present disclosure, the distributionconstraint and Bayesian estimation process can be implemented withneural network, which makes full use of the powerful image processingand characterization ability of the neural network, so that thedistribution constraint of noiseless images is easy to be implemented.Moreover, after the completion of the training, the network can denoisethe image in an end-to-end and analytical way, thus greatly reducing thecomputation amount and computation time, compared with the iterativeprocess of the iterative Bayesian estimation method.

Reference throughout this specification to “an embodiment,” “someembodiments,” “an example,” “a specific example,” or “some examples,”means that a particular feature, structure, material, or characteristicdescribed in connection with the embodiment or example is included in atleast one embodiment or example of the present disclosure. Theappearances of the above phrases in various places throughout thisspecification are not necessarily referring to the same embodiment orexample of the present disclosure. Furthermore, the particular features,structures, materials, or characteristics may be combined in anysuitable manner in one or more embodiments or examples. In addition,different embodiments or examples and features of different embodimentsor examples described in the specification may be combined by thoseskilled in the art without mutual contradiction.

Although embodiments of present disclosure have been shown and describedabove, it should be understood that above embodiments are justexplanatory, and cannot be construed to limit the present disclosure,for those skilled in the art, changes, alternatives, and modificationscan be made to the embodiments without departing from spirit, principlesand scope of the present disclosure.

What is claimed is:
 1. A Bayesian image denoising method based on adistribution constraint of noiseless images from a distribution of noisyimages, comprising: determining a distribution constraint of noiselessimages from a distribution of noisy images; and performing Bayesiandenoising on noisy images based on the distribution constraint ofnoiseless images, wherein determining the distribution constraint ofnoiseless images from the distribution of noisy images comprises: forimages of which the distribution is difficult to be expressedanalytically, or for images of which the distribution is too complex,estimating the distribution constraint of noiseless images by:performing analytical filtering or Bayesian iterative denoising on agroup of noisy image samples z₁, z₂, z₃, . . . , z_(n) to obtain initialestimation of corresponding noiseless image samples ŝ₁, ŝ₂, ŝ₃, . . . ,ŝ_(n); estimating a probability density function ƒ_(N) of a randomvector N for additive noise of the imaging system analytically, in whichnoise samples n₁*, n₂*, n₃*, . . . , n_(n)* are sampled from thedistribution of the random vector N, and the number of the noise samplesn₁*, n₂*, n₃*, . . . , n_(n)* is the same as that of the noisy imagesamples z₁, z₂, z₃, . . . , z_(n); adding the noise samples n₁*, n₂*,n₃*, . . . , n_(n)* and the estimated noiseless image samples ŝ₁, ŝ₂,ŝ₃, . . . , ŝ_(n) to generate estimated noisy image samples {circumflexover (z)}₁, {circumflex over (z)}₂, {circumflex over (z)}₃, . . . ,{circumflex over (z)}_(n), in which {circumflex over(z)}_(i)=ŝ_(i)+n_(i)*, i=1:n; calculating a distribution distanceD({circumflex over (Z)},Z) between a set Z for the estimated noisy imagesamples {circumflex over (z)}₁, {circumflex over (z)}₂, {circumflex over(z)}₃, . . . , {circumflex over (z)}_(n) and a set Z for the noisy imagesamples z₁, z₂, z₃, . . . , z_(n) according to a distribution distancefunction D, and determining the distribution distance D({circumflex over(Z)},Z) as the distribution constraint for noiseless images.
 2. Themethod of claim 1, wherein estimating a probability density functionƒ_(N) of a random vector N for additive noise of the imaging systemanalytically comprises: obtaining a distribution form of the randomvector N for additive noise according to physical laws of the imagingsystem, and obtaining the probability density function ƒ_(N) of therandom vector N by measuring distribution parameters of the randomvector N through simulation or actual experiments.
 3. The method ofclaim 1, wherein estimating a probability density function ƒ_(N) of arandom vector N for additive noise of the imaging system analyticallycomprises: fitting unknown distribution with Gaussian mixturedistribution; wherein, fitting the unknown distribution with Gaussianmixture distribution comprises: assuming that the random vector N foradditive noise obeys a Gaussian mixture distribution, and then obtainingthe probability density function ƒ_(N) of the random vector N foradditive noise by measuring Gaussian mixture distribution parametersthrough simulation or actual experiments.
 4. The method of claim 1,wherein estimating a probability density function ƒ_(N) of a randomvector N for additive noise of the imaging system analytically furthercomprises: transforming a distribution of non-additive noise into anapproximate additive noise through a specific mathematicaltransformation.
 5. The method of claim 1, wherein performing Bayesiandenoising on noisy images based on the distribution constraint ofnoiseless images comprises: constructing a Bayesian posterior costfunction Ψ for the noisy image samples z₁, z₂, z₃, . . . , z_(n) and theestimated noiseless image samples ŝ₁, ŝ₂, ŝ₃, . . . , ŝ_(n)${\psi( {{\overset{\hat{}}{s}}_{1},{\overset{\hat{}}{s}}_{2},{\overset{\hat{}}{s}}_{3},\ldots,{\overset{\hat{}}{s}}_{n}} )} = {{\sum\limits_{i = {1:n}}{\varphi_{f}( {z_{i} - {\overset{\hat{}}{s}}_{i}} )}} + {\lambda D}}$where, the first term ϕ_(f) of the Bayesian posterior cost function Ψ isa data fidelity term between the noisy image samples z₁, z₂, z₃, . . . ,z_(n) and the estimated noiseless image samples ŝ₁, ŝ₂, ŝ₃, . . . ,ŝ_(n), the second term D is the distribution constraint, that is,distribution distance D(Ŝ,S) or the distribution distance D({circumflexover (Z)},Z), λ is a hyperparameter adjusting the importance of thedistribution constraint; and minimizing the Bayesian posterior costfunction Ψ and obtaining final estimated noiseless images ŝ₁, ŝ₂, ŝ₃, .. . , ŝ_(n).
 6. The method of claim 5, wherein the minimizing theBayesian posterior cost function Ψ adopts a classical mathematicaloptimization method.
 7. The method of claim 6, wherein the classicalmathematical optimization method comprises any one of a Lagrangemultiplier method, an alternate iteration method, an alternativeconjugate least square method or a gradient optimization algorithm. 8.The method of claim 5, wherein minimizing the Bayesian posterior costfunction Ψ adopts a neural network optimization method, comprising:constructing an image denoising neural network Ω_(E) as an imagedenoising operator ŝ_(i)≡Ω_(E)(z_(i); θ_(E)), i=1:n, where θ_(E) is theparameter set of the image denoising neural network; taking the Bayesianposterior cost function Ψ as the cost function to train the imagedenoising neural network, and training the image denoising neuralnetwork using a dataset with a large number of noisy image samples; andinputting the noisy image samples into the image denoising neuralnetwork Ω_(E), and acquiring the estimated noiseless images from networkoutputs as Bayesian denoising results.
 9. The method of claim 1, whereincalculating a distribution distance D({circumflex over (Z)},Z) between aset {circumflex over (Z)} for the estimated noisy image samples{circumflex over (z)}₁, {circumflex over (z)}₂, {circumflex over (z)}₃,. . . , {circumflex over (z)}_(n) and a set Z for the noisy imagesamples z₁, z₂, z₃, . . . , z_(n) according to a distribution distancefunction D, comprises: assuming that the distribution distance functionD contains a neural network discriminator Ω_(D) as a distributiondistance operator, in which the neural network discriminator Ω_(D) has aparameter set θ_(D); and using the distribution of the set {circumflexover (Z)} for estimated noisy image samples and the distribution of theset Z for real noisy image samples to compute distribution distanceD({circumflex over (Z)},Z), where the distribution distance function Dis defined by a cross entropy distance function, which is realized bythe neural network discriminator Ω_(D):${D( {\overset{\hat{}}{Z},Z} )} = {{\sum\limits_{i = {1:n}}{\log( {\Omega_{D}( {z_{i};\theta_{D}} )} )}} + {\log( {1 - {\Omega_{D}( {{\overset{\hat{}}{z}}_{i};\theta_{D}} )}} )}}$or a Wasserstein distance function:${D( {\overset{\hat{}}{Z},Z} )} = {\sum\limits_{i = {1:n}}{{❘{{\Omega_{D}( {z_{i};\theta_{D}} )} - {\Omega_{D}( {{\overset{\hat{}}{z}}_{i};\theta_{D}} )}}❘}.}}$10. The method of claim 1, wherein calculating a distribution distanceD({circumflex over (Z)},Z) between a set {circumflex over (Z)} for theestimated noisy image samples {circumflex over (z)}₁, {circumflex over(z)}₂, {circumflex over (z)}₃, . . . , {circumflex over (z)}_(n) and aset Z for the noisy image samples z₁, z₂, z₃, . . . , z_(n) according toa distribution distance function D, comprises: assuming that thedistribution distance function D contains a neural network discriminatorΩ_(D) as a distribution distance operator, in which the neural networkdiscriminator Ω_(D) has a parameter set θ_(D); and using thedistribution of the set {circumflex over (Z)} for estimated noisy imagesamples and the distribution of the set Z for real noisy image samplesto compute distribution distance D({circumflex over (Z)},Z) where thedistribution distance function D is defined by a distribution distancefunction in generative adversarial network methods in the generativeadversarial network field.
 11. The method of claim 10, wherein thedistribution distance function in the generative adversarial networkmethods in the generative adversarial network field comprises thedistribution distance function of any one of Standard GAN (SGAN),Region-Separative GAN (RSGAN), Relativistic average Standard GAN(RaSGAN), Least Squares GAN (LSGAN) Relativistic average Least SquaresGAN (RaLSGAN), Hinge-loss GAN (HingeGAN), Wasserstein GAN with GradientPenalty (WGAN-GP), Region-Separative GAN with Gradient Penalty(RSGAN-GP) or Relativistic average Standard GAN with Gradient Penalty(RaSGAN-GP).